pacman::p_load(sf, terra, spatstat,
tmap, rvest, tidyverse)Hands on Exercise 2a- First-order Spatial Point Patterns Analysis Methods
1. Getting Started
1. Learning Outcome
Spatial Point Pattern Analysis (SPPA) is the evaluation of the pattern or distribution of a set of points on a surface. The points may represent:
events such as crimes, traffic accidents, or disease onsets, or
business services (e.g., coffee shops and fast-food outlets) or facilities such as childcare centres and eldercare centres.
First-order Spatial Point Pattern Analysis (1st-SPPA) focuses on understanding the intensity or density of points across a study area. It examines how the distribution of points varies over space, essentially identifying trends or patterns in point density. This type of analysis deals with the individual locations of points and their distribution, without considering interactions between them.
2. Data
Child Care Services data from data.gov.sg, a point feature data providing both location and attribute information of childcare centres.
Master Plan 2019 Subzone Boundary (No Sea), a polygon feature data providing information of URA 2019 Master Plan Planning Subzone boundary data.
3. Installing and Loading R Packages
4. Importing and Wrangling Geospatial Datasets
4.a. Import the master plan subzone boundary
mpsz_sf <- st_read("data/MasterPlan2019SubzoneBoundaryNoSeaKML.kml") %>%
st_zm(drop = TRUE, what = "ZM") %>% st_transform(crs = 3414)Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source
`C:\deepikarcodes\ISSS626-GAA\Hands_on_Exercises\Hands_on_Ex02\data\MasterPlan2019SubzoneBoundaryNoSeaKML.kml'
using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
4.b. Extract the Key attributes and KML descriptions
extract_kml_field <- function(html_text, field_name) {
if (is.na(html_text) || html_text == "") return(NA_character_)
page <- read_html(html_text)
rows <- page %>% html_elements("tr")
value <- rows %>%
keep(~ html_text2(html_element(.x, "th")) == field_name) %>%
html_element("td") %>%
html_text2()
if (length(value) == 0) NA_character_ else value
}4.c. Clean the subzone and structure data
mpsz_sf <- mpsz_sf %>%
mutate(
REGION_N = map_chr(Description, extract_kml_field, "REGION_N"),
PLN_AREA_N = map_chr(Description, extract_kml_field, "PLN_AREA_N"),
SUBZONE_N = map_chr(Description, extract_kml_field, "SUBZONE_N"),
SUBZONE_C = map_chr(Description, extract_kml_field, "SUBZONE_C")
) %>%
select(-Name, -Description) %>%
relocate(geometry, .after = last_col())4.d. Filter out Non- Mainland Zones
mpsz_cl <- mpsz_sf %>%
filter(SUBZONE_N != "SOUTHERN GROUP",
PLN_AREA_N != "WESTERN ISLANDS",
PLN_AREA_N != "NORTH-EASTERN ISLANDS")4.e. Save the cleaned subzone
write_rds(mpsz_cl, "data/mpsz_cl.rds")5. Importing Childcare Services data
childcare_sf <- st_read("data/ChildCareServices.kml") %>%
st_zm(drop = TRUE, what = "ZM") %>%
st_transform(crs = 3414)Reading layer `CHILDCARE' from data source
`C:\deepikarcodes\ISSS626-GAA\Hands_on_Exercises\Hands_on_Ex02\data\ChildCareServices.kml'
using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
5.a. Validate CRS match
st_crs(mpsz_cl) == st_crs(childcare_sf)[1] TRUE
6. Mapping the geospatial data sets
6.a. Do it Yourself!
library(sf)
library(ggplot2)
library(readr)
# Load cleaned subzone and childcare datasets
mpsz_cl <- read_rds("data/mpsz_cl.rds")
childcare_sf <- st_read("data/ChildCareServices.kml") %>%
st_zm(drop = TRUE, what = "ZM") %>%
st_transform(crs = st_crs(mpsz_cl)) # Ensure same CRSReading layer `CHILDCARE' from data source
`C:\deepikarcodes\ISSS626-GAA\Hands_on_Exercises\Hands_on_Ex02\data\ChildCareServices.kml'
using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
# Create the map
ggplot() +
geom_sf(data = mpsz_cl, fill = "gray", color = "black", size = 0.2) +
geom_sf(data = childcare_sf, color = "black", size = 1.2, alpha = 0.7) +
labs(title = "Childcare Services Across Singapore Subzones",
subtitle = "All layers aligned to EPSG:3414 (SVY21)",
caption = "Source: URA Master Plan & Data.gov.sg") +
theme_minimal()
6.b. Interactive mapping with tmap()
tmap_mode("view")tm_shape(childcare_sf) +
tm_dots()6.c. Let’s switch back to plot
tmap_mode("plot")7. Geospatial Data Wrangling
7.a. Converting sf data frames to ppp class
spatstat requires the point event data in ppp object form. The code chunk below uses [as.ppp()] of spatstat package to convert childcare_sf to ppp format.
childcare_ppp <- as.ppp(childcare_sf)class(childcare_ppp)[1] "ppp"
summary(childcare_ppp)Marked planar point pattern: 1925 points
Average intensity 2.417323e-06 points per square unit
Coordinates are given to 11 decimal places
Mark variables: Name, Description
Summary:
Name Description
Length:1925 Length:1925
Class :character Class :character
Mode :character Mode :character
Window: rectangle = [11810.03, 45404.24] x [25596.33, 49300.88] units
(33590 x 23700 units)
Window area = 796335000 square units
7.b. Creating owin object
it is a good practice to confine the analysis with a geographical area like Singapore boundary. In spatstat, an object called owin is specially designed to represent this polygonal region.
The code chunk below, as.owin() of spatstat is used to covert mpsz_sf into owin object of spatstat.
sg_owin <- as.owin(mpsz_cl)class(sg_owin)[1] "owin"
plot(sg_owin)
7.c. Combining point events object and owin object
Extraction the childcare events that are located within Singapore by using the code chunk below.
childcareSG_ppp = childcare_ppp[sg_owin]The output object combined both the point and polygon feature in one ppp object class as shown below.
childcareSG_pppMarked planar point pattern: 1925 points
Mark variables: Name, Description
window: polygonal boundary
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
8. Clark-Evan Test for Nearest Neighbour Analysis
Nearest Neighbor Analysis (NNA) is a spatial statistics method that calculates the average distance between each point and its closest neighbor to determine if a pattern of points is clustered, dispersed, or randomly distributed.
Clark-Evans test is a specific statistical method used within NNA to quantify whether a point pattern is clustered, random, or uniformly spaced, using the Clark-Evans aggregation index (R) to describe this pattern. NNA provides a numerical value that describes the degree of clustering or regularity, and the Clark-Evans test calculates a specific index (R) for this purpose.
8.a. Perform Clark Evans Test without CSR
library(spatstat.explore)clarkevans.test(childcareSG_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"))
Clark-Evans test
No edge correction
Z-test
data: childcareSG_ppp
R = 0.53532, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
Statistical Conclusion – The Clark-Evans test produced an observed R value of 0.53532 with a p-value smaller than 2.2e-16, indicating strong statistical support that childcare services follow a clustered distribution rather than a random one.
Business Communication – The analysis suggests that childcare facilities in Singapore tend to be grouped in particular locations instead of being evenly or randomly spread out. Such clustering could point to an excess of services in some neighborhoods while leaving other areas lacking. This insight signals to urban planners, policymakers, and providers the importance of reevaluating site selection to ensure more equal access to childcare options across the city.
8.b. Clark Exans Test with CSR
clarkevans.test(childcareSG_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
method="MonteCarlo",
nsim=99)
Clark-Evans test
No edge correction
Monte Carlo test based on 99 simulations of CSR with fixed n
data: childcareSG_ppp
R = 0.53532, p-value = 0.01
alternative hypothesis: clustered (R < 1)
Statistical Conclusion – The Clark-Evans test with Monte Carlo simulation produced an R value of 0.53532 and a p-value of 0.01. These results strongly reject the null hypothesis of randomness, confirming that the spatial arrangement of childcare services is highly clustered.
Business Conclusion – The analysis shows that childcare centers in Singapore are concentrated in certain subzones rather than being spread evenly. Such clustering suggests that some areas may have more facilities than needed, while others face shortages. For policymakers, urban planners, and providers, this finding emphasizes the importance of revisiting location planning to promote fairer access. Expanding services into underserved neighborhoods could enhance both coverage and community well-being.
9. Kernel Density Estimation Method
Kernel Density Estimation (KDE) is a valuable tool for visualising and analyzing first-order spatial point patterns. It is widely considered a method within Exploratory Spatial Data Analysis (ESDA) because it’s used to visualize and understand spatial data patterns by transforms discrete point data (like locations of childcare service, crime incidents or disease cases) into continuous density surfaces that reveal clusters and variations in event occurrences, without making prior assumptions about data distribution. It helps to begin understanding data distribution, identify hotspots, and explore relationships between spatial variables before performing more rigorous analysis.
9.a. Automatic Bandwidth Selection
kde_SG_diggle <- density(
childcareSG_ppp,
sigma = bw.diggle,
edge = TRUE,
kernel = "gaussian"
)plot(kde_SG_diggle)
summary(kde_SG_diggle)real-valued pixel image
128 x 128 pixel array (ny, nx)
enclosing rectangle: [2667.538, 55941.94] x [21448.47, 50256.33] units
dimensions of each pixel: 416 x 225.0614 units
Image is defined on a subset of the rectangular grid
Subset area = 669941961.12249 square units
Subset area fraction = 0.437
Pixel values (inside window):
range = [-6.584123e-21, 3.063698e-05]
integral = 1927.788
mean = 2.877545e-06
9.b. Retrieve the bandwidth value
bw <- bw.diggle(childcareSG_ppp)
bw sigma
295.9712
9.c. Rescalling the KDE Values
childcareSG_ppp_km <- rescale.ppp(
childcareSG_ppp, 1000, "km")9.d. With density() -
kde_childcareSG_km <- density(childcareSG_ppp_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")plot(kde_childcareSG_km)
9.e. Working with different automatic badwidth methods
Cross-validation based; tends to be broader.
bw.CvL(childcareSG_ppp_km) sigma
4.357209
9.f. Based on Scott’s rule; anisotropic:
bw.scott(childcareSG_ppp_km) sigma.x sigma.y
2.159749 1.396455
9.g. Let’s try for tighter cluster:
bw.ppl(childcareSG_ppp_km) sigma
0.378997
9.h. Ideal way to detect single tight clusters
bw.diggle(childcareSG_ppp_km) sigma
0.2959712
9.i. Computing KDE with bw.ppl() and comparing -
kde_childcareSG.ppl <- density(childcareSG_ppp_km,
sigma=bw.ppl,
edge=TRUE,
kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_childcareSG_km, main = "bw.diggle")
plot(kde_childcareSG.ppl, main = "bw.ppl")
10. Working with different kernel methods
par(mfrow=c(2,2))
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="gaussian"),
main="Gaussian")
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="epanechnikov"),
main="Epanechnikov")
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="quartic"),
main="Quartic")
plot(density(childcareSG_ppp_km,
sigma=0.2959712,
edge=TRUE,
kernel="disc"),
main="Disc")
11.Fixed and Adaptive KDE
11.a. Computing KDE by using fixed bandwidth
kde_childcareSG_fb <- density(childcareSG_ppp_km,
sigma=0.6,
edge=TRUE,
kernel="gaussian")
plot(kde_childcareSG_fb)
11.b. Computing KDE by using adaptive bandwidth
kde_childcareSG_ab <- adaptive.density(
childcareSG_ppp_km,
method="kernel")
plot(kde_childcareSG_ab)
12. Plotting cartographic quality KDE map
12.a. Converting gridded output into raster
kde_childcareSG_bw_terra <- rast(kde_childcareSG_km)class(kde_childcareSG_bw_terra)[1] "SpatRaster"
attr(,"package")
[1] "terra"
kde_childcareSG_bw_terraclass : SpatRaster
size : 128, 128, 1 (nrow, ncol, nlyr)
resolution : 0.4162063, 0.2250614 (x, y)
extent : 2.667538, 55.94194, 21.44847, 50.25633 (xmin, xmax, ymin, ymax)
coord. ref. :
source(s) : memory
name : lyr.1
min value : -5.824417e-15
max value : 3.063698e+01
unit : km
12.b Assigning projection systems
crs(kde_childcareSG_bw_terra) <- "EPSG:3414"12.c. Properties of kde_childcareSG_bw_raster RasterLayer:
kde_childcareSG_bw_terraclass : SpatRaster
size : 128, 128, 1 (nrow, ncol, nlyr)
resolution : 0.4162063, 0.2250614 (x, y)
extent : 2.667538, 55.94194, 21.44847, 50.25633 (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414)
source(s) : memory
name : lyr.1
min value : -5.824417e-15
max value : 3.063698e+01
unit : km
12.d Plotting KDE map with tmap
Let’s display the raster in cartographic quality map using tmap package:
tm_shape(kde_childcareSG_bw_terra) +
tm_raster(col.scale =
tm_scale_continuous(
values = "viridis"),
col.legend = tm_legend(
title = "Density values",
title.size = 0.7,
text.size = 0.7,
bg.color = "white",
bg.alpha = 0.7,
position = tm_pos_in(
"right", "bottom"),
frame = TRUE)) +
tm_graticules(labels.size = 0.7) +
tm_compass() +
tm_layout(scale = 1.0)
13. First Order SPPA at the Planning Subzone Level
13.a Geospatial data wrangling
13.a.1. The code segment presented below demonstrates the extraction of target planning areas within the study area.
pg <- mpsz_cl %>%
filter(PLN_AREA_N == "PUNGGOL")
tm <- mpsz_cl %>%
filter(PLN_AREA_N == "TAMPINES")
ck <- mpsz_cl %>%
filter(PLN_AREA_N == "CHOA CHU KANG")
jw <- mpsz_cl %>%
filter(PLN_AREA_N == "JURONG WEST")It is always a good practice to review the extracted areas. The code chunk below will be used to plot the extracted planning areas.
par(mfrow=c(2,2))
plot(st_geometry(pg), main = "Ponggol")
plot(st_geometry(tm), main = "Tampines")
plot(st_geometry(ck), main = "Choa Chu Kang")
plot(st_geometry(jw), main = "Jurong West")
13.a.2 Next, we will convert the sf objects into owin objects, which are the required format for performing spatial analyses in the spatstat package:
pg_owin = as.owin(pg)
tm_owin = as.owin(tm)
ck_owin = as.owin(ck)
jw_owin = as.owin(jw)13.a.3 Combining point events object and owin object:
childcare_pg_ppp = childcare_ppp[pg_owin]
childcare_tm_ppp = childcare_ppp[tm_owin]
childcare_ck_ppp = childcare_ppp[ck_owin]
childcare_jw_ppp = childcare_ppp[jw_owin]13.b Next, rescale.ppp() function is used to transform the unit of measurement from metre to kilometre:
childcare_pg_ppp.km = rescale.ppp(childcare_pg_ppp, 1000, "km")
childcare_tm_ppp.km = rescale.ppp(childcare_tm_ppp, 1000, "km")
childcare_ck_ppp.km = rescale.ppp(childcare_ck_ppp, 1000, "km")
childcare_jw_ppp.km = rescale.ppp(childcare_jw_ppp, 1000, "km")13.c. The code chunk below is used to plot these four study areas and the locations of the childcare centres.
par(mfrow=c(2,2))
plot(unmark(childcare_pg_ppp.km),
main="Punggol")
plot(unmark(childcare_tm_ppp.km),
main="Tampines")
plot(unmark(childcare_ck_ppp.km),
main="Choa Chu Kang")
plot(unmark(childcare_jw_ppp.km),
main="Jurong West")
13.d. Clark and Evans Test
13.d.1 Choa Chu Kang planning area In the code chunk below, clarkevans.test() of spatstat is used to performs Clark-Evans test of aggregation for childcare centre in Choa Chu Kang planning area.
clarkevans.test(childcare_ck_ppp,
correction="none",
clipregion=NULL,
alternative=c("two.sided"),
nsim=999)
Clark-Evans test
No edge correction
Z-test
data: childcare_ck_ppp
R = 0.84097, p-value = 0.008866
alternative hypothesis: two-sided
13.c. Computing KDE surfaces by planning area
The code chunk below will be used to compute the KDE of these four planning area. bw.diggle method is used to derive the bandwidth of each:
par(mfrow=c(2,2))
plot(density(childcare_pg_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Punggol")
plot(density(childcare_tm_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Tempines")
plot(density(childcare_ck_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Choa Chu Kang")
plot(density(childcare_jw_ppp.km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian"),
main="Jurong West")